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ABSTRACT 


Recent research in the area of database machines has 
been directed at achieving greater efficiency and increasing 
user-friendliness. This thesis is concerned with the second 
of these research directions, increasing user-friendliness. 
One development toward increased user-friendliness is’ the 
growing acceptance of the relational data model and rela- 
tional query languages. Relational interfaces provide’ the 
user with an easy-to-understand data representation and 
language with which to manipulate the data. 

This thesis presents the design and analysis of a_ rela- 
tional query language interface, using the SQL relational 
query language, for the Multi-Backend Database System 
(MDBS), a database machine which uses the attribute-based 
model. The purpose is two-fold: first, to provide the user 
with an eaSier-to-understand language-to-machine interface, 
thereby making MDBS available to the wider community of 
relational database uSers; second, to investigate how the 
attribute-based model may be used to support’ relational 


databases. 
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I. INTRODUCTION 


The rapid growth in the use of database management Sys- 
tems (DBMSs) has stimulated research to produce more effi- 
cient and eaSier-to-use systems. Greater efficiency is re- 
quired in order to offset the inherent costs of operating a 
DBMS. One area of research directed at producing greater 
efficiency is in the development of database machines. Da- 
tabase machines use specially configured hardware, tailor- 
made software, and innovative techniques such as masSive 
parallelism to support higher capacity and performance. 
Greater ease of use is necessary in order to ensure a wider 
distribution of use. The emergence of database systems  us- 
ing the relational model of data is an important development 
in this area. 

One of the database machines of interest is the Multi- 
Backend Database System (MDBS). The idea of MDBS is to use 
general-purpose hardware and special-purpose software in a 
novel configuration to provide a backend database machine 
solution. The design and development of MDBS is an = ongoing 
project [Ref. 1 and 2]. In this thesis, we will not examine 
the particular database machine solution to the efficiency 
problem. Rather the contribution of this thesis to the MDBS 


research is in the area of ease of use. We will determine 








how the relational query language of SQL [Ref. 3] can be 
Supported by the attribute-based query language of MDBS 
[Ref. 1}. In the next two sections, a brief review of the 
design goals of MDBS and of the development of relational 
query languages is presented. In the final section of the 


chapter, the organization of the thesis is discussed. 


A. DESIGN GOALS FOR EFFICIENCY IN MDBS © 

As previously mentioned, research in database machines 
has been driven by the need to develop more efficient sys- 
tems. Efforts have resulted in a variety of machines which 
include: CASSM [Ref. 4 and 5], RAP [Ref. 6], DBC [Ref. 7 and 
Sl DIRECT [Ref. 9], MDBS [Ref. 1 and 2], RDBM ([Ref. 10], 
Meo (Ref. ll], DBMAC [{Ref. 12], and IDM [Ref. 13]. This 
is not a complete listing, but does illustrate the fact that 
no "“best" architecture has been developed. Each of the 
machines listed are unique. This uniqueness makes classifi- 
cation impossible. However, while no true taxonomy of data- 
base machines exists, Strawser [{Ref. 14] cites several 
design issues that can be used to categorize the systems. 
Three of these issues, processor structure, interconnection 
of the processor and the database store, and alternative 
physical organizations have particular relevance to the MDBS 


design. Within each of the issues there exist tradeoffs 


10 








that affect the performance of the various machines. What 
follows is a brief description of these three design issues, 
and the MDBS solutions. 

Many database machines are organized with a single con- 
trol processor and one or more slave processors. As in any 
such system, the control processor is a potential 
bottleneck. Some deSigns seek to alleviate this problem by 
having the control processor perform .only administrative 
tasks, or by otherwise limiting its responsibility. At the 
other end of the spectrum, some machines permit the control 
processor to participate in query execution. Irrespective 
of the design chosen, throughput will be inversely propor- 
tional to the amount of work levied on the control proces- 
SOL. A goal of the MDBS design is to minimize the poten- 
tial control processor bottleneck. The control processor 
performs a minimal set of functions, only those which are 
necessary to administer query execution. 

Additional differentiation of processor structures’ can 
be made between homogeneous and heterogeneous multiprocessor 
organizations. Homogeneous organizations use a number of 
processors with identical functionality. This allows for a 
high degree of intra-query parallelism. The heterogeneous 
organization is characterized by a number of processors with 


Specialized functionality, thus permitting ise Geer y 


ll 








parallelism. MDBS uses a homogeneous multiprocessor organi- 
zation, offering a high degree of intra~query parallelism. 
The software in the backend processors is identical, allow- 
ing eaSy expansion of the system by replicating the software 
when new backend processors are added. The backend proces- 
Sors operate in parallel. However, the backends also 
operate independently. Each backend has a separate schedul- 
ing mechanism, to make the optimum uSe of resources. Com- 
munication between the processors 1S via a broadcast bus, to 
minimize communication overhead. 

There are two major categories to describe the intercon- 


nection of the processor and the database store. The first, 


direct interconnection, connects the processor directly to 


the database store. While this method has an advantage in 
that the processors never have to wait for data, it suffers 
in two respects. The processor must be able to work at 
speeds equivalent to the transfer rate of the secondary 
Storage devices, and data sharing among processors is 


severely limited. The second major category is the 


hierarchical interconnection. This method, which is more 


prevalent, transfers data from the database store to RAM 


Storage for access by the processors. In MDBS, each backend 


eZ 








processor has dedicated disk drives, eliminating contention 
between processors for the same device. Data is staged from 
the disk to buffers in the main memory. 

Like other deSign issues, the motivation for seeking al- 
ternative physical organizations is to enhance performance. 
Two such designs are the data pool organization used in 
DBMAC [Ref. 12], and the V-Relation scheme used in VERSO 
{Ref. ll]. However, the gains realized from these organiza- 
tions apply only to some operations. MDBS uses a cluster- 
ing methodology to organize the database. Records in the 
database are divided into clusters based on attribute 
values. The clusters of the database are then Spread across 
the backends, so that the advantages of parallelism are 
realized for all operations. In other words, for database 
access, MDBS attempts to achieve 


record-serial-cluster-parallel operations. 


B. RELATIONAL QUERY LANGUAGE DEVELOPMENTS 

Each successive generation of database languages has 
Sought to make data manipulation more user-friendly. The 
idea is to remove from the user any responsibility for 
knowing the particularities of system structure. Early 
representations of databases, first the hierarchical model 


and then the network model, require the user to understand 
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the organization of the database in order to navigate 
through it for the purpose of storage, retrieval and update 
of the user data. The relational database approach attempts 
to present the user with an easSy-to-understand tabular 
representation of the stored data which makes the _ storage, 
retrieval and. update operations aS simple as_ table 
manipulation. 

Codd [Ref. 15] first proposed tuple relational calculus 
as a benchmark for evaluating data manipulation languages 
based on a relational model. The mathematical concept 
underlying the relational model is the set theoretic rela- 
tion, which is a subset of the Cartesian product of a list 
of domains. A relation is any subset of the Cartesian pro- 
duct of one or more domains. Conceptually, a relation can 
be viewed as a Simple, two-dimensional table that has 
several properties. First, the entries in the table are 
Single-valued; neither repeating groups nor arrays are al- 
lowed. Secondly, the entries in any column are all of the 
same kind, that is each column has a domain of values that 
can appear in the column. Each column has a unique name and 
the order of the columns is immaterial. In the relational 
model columns are referred to as attributes. 

The advantages inherent in the relational model are that 


no artificial constructs such aS sets or pointers are 
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required, and that the data is represented in tabular (rela- 
tional) form in a way that is familiar to the user. Opera- 
tions on the data are specified logically or symbolically by 
relational algebra or calculus. This is of major importance 
in that while the data structure is predefined, the record 
relationships are not defined until they are used. Conse- 
quently, any relationship that can be expressed in relation- 
al algebra or calculus can be used. ‘Among the advantages 
cited for relational DBMSs is increased productivity in ap- 
plications development, due EO the simplicity and 
flexibility of the model and the relational query languages. 

The importance of the relational model in regards’ to 
this paper is not in its implementation, but rather the log- 
ical representation it offers to the uSer. This representa- 
tion is developed through the use of relational query 
languages like SQL. SQL, earlier called SEQUEL, was first 
introduced by Chamberlin [Ref. 16] to be used with the 
relational model. It was another attempt to provide’ the 
user with an English~like language with which he could 
construct and manipulate his database. Developments7~ and 
changes to the language grew out of IBM System R research 


[Ref. 3]. 
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As pointed out by Hsiao [Ref. 17] the relational model 
suffers in its lack of solutions to the problems of database 
transformation and query translation. Conversely, any rela- 
tional database may be transformed, in a straightforward 
way, into the attribute-based database used by MDBS. There- 
fore it is practical to think in terms of a relational data- 
base implemented on MDBS. Developing a relational query 
language interface to MDBS has eeveral advantages. First, 
we provide an easy-to-use interface which afford the user 
the productivity increase claimed for relational query 
languages. Second, by choosing to implement the interface 
for SQL, the most widely used relational query language, we 
provide homogeneity for a wide community of database system 
users. Third, we identify those areas in MDBS where 
enhancements must be made in order to provide ae full 


relational language capability. 


SaORGANIZATION OF THE THESIS 

In Chapter 2 an overview of the organization of the 
multi-backend system is presented. Chapter 3 describes the 
MDBS query language. The SQL query language is described in 
Chapter 4. Chapter 5 explains the mappings from SQL to the 
MDBS query language. Chapter 6 offers recommendations for 
implementation, and Chapter 7 summarizes the conclusions 


obtained from the research experience. 
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II. ORGANIZATION OF THE MULTI-BACKEND SYSTEM 


An understanding of the organization of the multi- 
backend database system is helpful in understanding some of 
the design considerations of the MDBS query language. Fig- 
ure l is a representation of the MDBS hardware organization. 
The system iS comprised of a controller and a number of 
backends, all general-purpose minicomputers. A broadcast 
bus connects the controller and the backends. Each backend 
has a dedicated number of disk drives. 

The major design goals of MDBS are to allow the database 
to grow and the rate of requests to increase while maintain- 
ing good overall performance. To obtain these goals’ the 
multi-backend database system should have the following 
properties: 

(1) Throughput improvement is proportional to the 

multiplicity of backends; 

(2) Response time is inversely proportional to the 

multiplicity of backends; 

(3) The system is extensible for future growth 

and/or performance improvement; 
These properties are obtained through various MDBS_ design 
considerations. MDBS seeks to minimize the potential of the 


control processor to be a bottleneck by minimizing the 
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controller functions. Accordingly, MDBS is viewed in terms 
of controller functions and backend functions, as depicted 
in Figure 2. Each backend is responsible for conducting its 
own operations, including queueing and scheduling OF: re- 
quests. Identical operating software is maintained at each 
backend. Expansion of the system is accomplished by 
replicating this software in additional backends. 

The database is distributed across all the backends via 
the clustering mechanism, explained in the next section of 
this chapter. Requests from the controller are broadcasted 
to all the backends at the same time for procesSing. This 
allows for parallel processing of requests. RequestS are 
queued at each backend. To permit continuous execution of 
requests each backend schedules request execution indepen- 
dently. The addition of more backends results in an increase 
in parallel processing of requests, which improves 
throughput and response time. 

In the next three sections, descriptions of the 
attribute-based data model, the functions of the controller 


and the functions of the backend are presented. 


A. THE ATTRIBUTE-BASED DATA MODEL 
The data model used in MDBS is the attribute-based model 


developed by HSiao and Harary [Ref. 17]. In their work they 
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use the set A to represent attributes and the set V_ to 
represent values. A record is then defined to be a subset 
of the Cartesian product A x V, where each attribute has one 
and only one value. This way the record, R, 1s a set of 
ordered pairs of the form (an attribute, its value). 

For each record R, a set of its attribute-value obairs 
which collectively characterize R is formed. These sets of a 
record collection form an index. These ordered pairs in the 
index are called the keywords. inemeindex ~~ is used to 
identify a record or a set of records. 

Following the keywords is the record body, which is a 
String of characters not used by MDBS for Search purposes. 
An example of a record index without a following body is 
Shown below. 

(<FILE,employee>,<NAME,Smith>,<CITY,Monterey>, <RANK, 3>) 
The first attribute-value pair in all records of a file are 
the same, since it designates the file name. In the example 
above the file name is "employee". 

In order to enhance the performance of the system, 
records are logically grouped into clusters. The clustering 
is determined by the attribute values and attribute value 


rangeS in the records. In the example above, records could 
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be clustered on the NAME attribute, with all employees 
having a last name starting with the letter 'S' clustered 
together. 

Keyword predicates are used in the data manipulation 
language for search and retrieval purposes. The keyword 
predicate has the form (attribute, relational operator, 
value). For example, 

(SALARY > 2000) | 

is a simple greater-than predicate. A keyword is_ said _ to 
Satisfy a predicate if the attribute of the keyword is 
identical to the attribute of the predicate and the relation 
specified by the relational operator of the predicate holds 
for the value of the keyword and the value of the predicate. 
For example, the keyword <RANK,4> satisfies the predicate 
(RANK < 6). 

A conjunction is simply a conjunction of predicates, 

such as: 

(SALARY > 10000) A (RANK = 3) 
Eumeeecorad Satisflies a query pomunecion. if the record 
contains keywords that satisfy every predicate in the 
conjunction. A query is a boolean expression of predicates, 


such as: 


((DEPT = Sales) A’ (SALARY < 10000)) V 


((DEPT = Sales) A, (SALARY > 15000)) 
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B. IMPLEMENTATION OF THE ATTRIBUTE-BASED MODEL IN MDBS 

The indices of the attribute-based model are 
implemented in MDBS as descriptors. Descriptors are defined 
for designated directory attributes. The rules of 
definition require that the descriptors for a directory 
attribute form a partition over the domain of the attribute. 

Clusters result from the partitioning of the database 
according to the descriptor definitions. A record belongs 
to the cluster defined by the set of descriptors which can 
be derived from the keywords of the record. 

The clustering mechanism provides an ideal vehicle for 
distributing data across the backends of MDBS to take full 
advantage of parallelism. The records of a cluSter are 
distributed track-at-a-time across all the backends. 
Therefore the work of query execution can be shared across 
the backends, with each backend processing the query against 
its portion of the relevant cluster(s). For amore detailed 
explanation of the clustering mechanism, readers are 


referred to [Ref. l]. 
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fee FUNCTIONS OF THE CONTROLLER 

It is important to reiterate that a basic design 
conSideration of MDBS is to minimize the functions of the 
Comcroller. These functionS are divided into three 
categories: request preparation, insert lmeeormat2on 
generation, and post processing. The request preparation 
functionS comprise the necesSary operations performed on a 
request prior to its broadcast to the backends. These 
functions include parsing and syntax checking. InSert 


information generation functions are performed during’ the 


processing of an inSert request in order to supply 
additional information needed by the backends. Post 
Bueeessing functions are performed after replies are 


returned from the backends. For example, these functions 
perform housekeeping duties on the Separate responses to the 
Single broadcast request, that is, the collection of the 


data prior to transmission to the host machine. 


Pee UNCTIONS OF THE BACKEND 

Functions within each backend are divided into three 
categories: directory management, record processing, and 
concurrency control. The directory management function is 
further divided into descriptor search, cluster search, 


address generation, and directory table maintenance. [It is 
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responsible for searching through the descriptors’ and 
clusters to determine the disk addresses for the records’ to 
be accessed. The record processing functions include: record 
Storage, record retrieval, record selection, and attribute 


value extraction of the retrieved records. Concurrency 


control is maintained by the locking of clusters to prevent 


conflicting access to the same clustered data. 

Figure 3 is a representation of the operations performed 
on a uSer- request. A request iS Submitted to the host, 
which converts it to the internal form of the MDBS 
environment. The controller parses the request and checks 
for syntax errors, then broadcasts the request to all of the 
backends. The work of descriptor search is shared among all 
backends. Each backend does its portion of descriptor 
search, and broadcasts its findings to all the other 
backends. When all descriptors have been identified, each 
backend independently performs cluster Search. The 
appropriate records are then selected, values extracted and 
the results sent back to the controller. The controller 
collects the results from all the backends, performs any 
final aggregation required, and forwards the data to the 


host. 
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III. THE MDBS- QUERY LANGUAGE 

The query language for MDBS is a non-procedural language 
in which queries are expressed in the disjunctive normal 
Boemee ine language itself supports four different types of 
requests: retrieve, insert, delete, and update. Appendix A 
is a formal specification of the language. In the examples 
below, reserved words are capitalized and optional portions 


of queries are enclosed in brackets. 


elas RETRIEVE REQUEST 
The RETRIEVE is the most flexible of the operations on 
the database. It is the user's vehicle to query the data- 
base for information. Unlike the other three operations, 
the retrieve does not alter the contents of the database. 
Its syntax is: 
PPateve Ouery Target list [BY attribute] (WITH pointer]. 
As shown above, the RETRIEVE request consists of five parts. 
The last two parts, those enclosed in square brackets, are 
optional. The operator RETRIEVE is a reserved word and 
indicates the type of request. The Guery is made up of 
predicates in the disjunctive normal form. The query 
defines the portion of the database which is to. be 


retrieved. The SaugcterstertSwemen LISty of attributes for 
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which values are to be extracted from the records which 
Satisfy the query. The attribute value may be a value _ from 
the record, or an aggregate of values from multiple records. 
Five aggregate operators are supported in MDBS: AVG, COUNT, 
MAX, MIN, and SUM. 
iivemsy Clause performs an ordering on the data returned. 
For example, to RETRIEVE all the employees names ordered 
according to department, the following query can be used. 
RETRIEVE(FILE = Employee) <NAME> BY DEPT 
The WITH clause specifies whether pointers to the 
retrieved records must be returned to the user for later use 
in an update request. 
Let us examine some examples of RETRIEVE requests. 
Example 1. Retrieve the names of all employees who make 
less than $10,000. 
RETRIEVE((FILE = Employee) A (SALARY < 10000) ) <NAME> 
Example 2. Retrieve the average salary of employees who 
have a rank greater than 2, order by depart- 
ment. | 
RETRIEVE((FILE = Employee) A (RANK > 2)) 


<AVG (SALARY)> BY DEPT 
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Eee tH& INSERT REQUEST 

The INSERT request is used to add records to the data- 
base. The syntax is: 

INSERT Record 
where record is the record to be inserted into the database. 
An example of an INSERT request is: 
INSERT (<FILE,Employee>,<NAME,Smith>,<SALARY,10000>) 

This creates a record an the employee file for Smith and 


sets his salary at 10000. 


See tHE DELETE REQUEST 

The DELETE request is used to remove records from the 
database. The syntax is; 

DELETE Query 
where query is of the same form as that used in the RETRIEVE 
request. An example is: 
DELETE((FILE = Employee) A (NAME = Smith) ) 

This deletes all records in the employee file for employees 


named Smith. 


D. THE UPDATE REQUEST 


The UPDATE request is used to modify values for records 
which already exist in the database. The syntax is: 
UPDATE Query Modifier 


mere the query specifies the particular records to be 
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modified and modifier indicates the type of modification 
that is to be performed. MDBS allows five types of 
modifications. 
ties I YPE-O modification sets the new value of the attri- 
bute being modified to a constant. An example of a TYPE-O 
meaditication is: 
UPDATE ((FILE = Employee) A (NAME = Smith) ) 
<SALARY = 50005 
This sets the salary of all employees named Smith to 5000. 
DmEncmnxP eo) modliiication, the new value of the attri- 
bute is set to some function of the old value of the attri- 
bute in the record being modified. An example of a TYPE-I 
Ne@dification is: 
UPDATE ((FILE = Employee) A (NAME = Smith) ) 
<SALARY = 2 * SALARY> 
This doubles the salary of all employees named Smith. 
Some MoOdbelCatwon sets the new valuevof the at- 
tribute to some function of another attribute contained 
within the same record. Where a TYPE-I modification was a 


function of the same attribute, the TYPE-II modification 


looks at another attribute to derive a value. An example of 
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emer PE—-II modification is: 
UPDATE ((FILE = Employee) A (NAME = Smith) ) 
<SALARY = 8 * RANK> 

This makes all the salaries of employees with the last name 
Smith equal to eight times the value of their rank. 

MhestyYPE-Ilil modifier derives the new paae SEmene at— 
tribute being modified from some function of another attri- 
bute value contained in another record which is’ identified 
by the query in the modifier. An example of a TYPE-III 
Modification is: 

UPDATE((FILE = Employee) A (NAME = Smith) )<SALARY = 
SALARY of (FILE = Positions) A (JOB = Manager) > 

Here employees named Smith get their salary set to that of a 
manager's, as recorded in the Positions file. 

see lYPE-IV modifier derives the new value of the attri- 
bute being modified from a function of another attribute 
value in another record identified by the pointer in_= the 
modifier. This requires a retrieval request first in order 
to obtain the value for the pointer. An example of a 
TYPE-IV modification is: | 

RETRIEVE((FILE = Employee) \ (NAME = Jones)) with Pointer 
The retrieve request returns the value of a pointer, in this 
example, say, 2000. So we can then execute the following 


update request. 
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UPDATE( (FILE = Employee) A\ (NAME = Smith) ) 


<SALARY = SALARY of Pointer> 
The effect of these two queries is that all employees with 


the name Smith have their salary set to that of Jones. 


Bi 


IV. THE RELATIONAL QUERY LANGUAGE, SQL 

Data in the relational data model is depicted aS a _ two- 
dimensional table. The relational query language, SQL 
attempts to exploit this representation. It does this by 
providing an English-like language that allows the user to 
list the attributes from a relation meeting the user's 
Selection requirements. For amore complete description, 
the reader is referred to [{Ref. 3 and 16]. 

Various implementations of SQL provide many functions 
and facilities beyond the basic SQL. The four basic con- 
StructsS are: Select, insert, delete, and update. However, 
in illustrating the use of the basic constructs, we include 
Some other functions and facilities. In particular, some of 
the examples and conStructs Shown below are those imple- 
mented by the Oracle Corporation ([Ref. ea database 


management system. 


Peeeiae SELECT REQUEST 
The SELECT request is used for retrieval of data from 
the database. Its general form is as follows. 
SeEECT A ,peoe vA 
FROM R 


WHERE B @ b AND ... AND BQ b 
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where A and B are attributes found in the relation R, is a 
relational operator (Such as >, <, =, =, >, < ),, and bis a 
constant. In particular, B d b is termed a predicate. 
Within the general guidelines above, SQL offers a great deal 
of latitude in the formation of the SELECT query. Let us 
look at each clause separately, the SELECT clause, the FROM 
line, and the WHERE line. 

Instead of listing the attributes to be retrieved on the 
SELECT clause the user may request the return of the entire 
relation by using the wild card character, ‘'*', SQL also 
allows for the use of aggregate operators (such as AVG, SUM, 
MAX), arithmetic operators (such as +, — , / ), and arith- 
metic functions (such as ROUND, TRUNC). Additionally, SQL 
permits the user to define the format for the retrieved 
data. These are only some of the basic variations 
permitted. Examples of these options follow: 

Example 1. Retrieve all the attributes for all the 


employees. (Use of the wildcard.) 


SRbre i 


FROM Employee 
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Example 2. 


Example 3. 


Example 4. 


Example 5. 


Obtain the average salary of all the 


employees. (Use of an aggregate operator.) 


SELECT AVG (Salary) 


FROM Employee 


Obtain the total of the salary and com- 
mission for each employee. (Use of an 


arithmetic operator.) 


SELECT Salary + Commission 


FROM Employee 


Retrieve the Salaries of all the employees, 
rounded to the nearest dollar. (Use of an 


arithmetic function.) 


SELECT ROUND (Salary) 


FROM Employee 


Retrieve the dates of hiring for all the em- 
ployees, and format them to read 
month/day/year (ex. 09/24/50). 


(Use of a format option.) 
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SELECT TOVCHAR(Hiredate, “MM/DD/YY") Hiredate 


FROM Employee 


The FROM line identifies the relation or relations’ from 
which data is to be retrieved. A Single relation is speci- 
fied for simple retrievals. Two or more relations are 
Specified for join operations. 

An example of a simple SELECT on a Single relation is as 


follows. 


Example 6. Return the names of all the employees. 
SELECT Name 


FROM Employee 


An example of a join, involving two relations in this’ case, 


1s as follows. 


Example 7. Return all the names and locations of the de- 
partments which have an employee named Smith. 
SELECT Name, Location 
FROM Employee, Department 
WHERE Name = Smith 
The WHERE line establishes the conditions on which the 
retrieval is to be made. Predicates are used to qualify the 
selection of tuples from the relations(s). Only those 


tuples which satisfy the predicates are selected. Like the 
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BebeCT line it has many variations. These variations in- 
clude: an attribute of the relation compared to some con- 
Stant, the testing of an attribute for set membership, the 
use of boolean operators to create complex conditions, and 
the ability to nest additional SELECT clauses in order. to 
extract values for comparison. The following are examples 


of some of these variations. 


Example 8. Retrieve the names and salaries of all the 
employees that have a salary equal to 10000. 


(Comparison of an attribute to a constant.) 


SELECT Name, Salary 
FROM Employee 


WHERE Salary = 10000 


Example 9. Obtain the names of the employees whose jobs 
are either a clerks, analysts, or managers. 


(A test for set inclusion.) 


SELECT Name 
FROM Employee 


WHERE Job IN (Clerk,Analyst,Manager) 
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Example 10. List the names of all the employees that 
have a salary equal to 10000 and are named 


Smith. (A logical AND operation.) 


SELECT Name 
FROM Employee 


WHERE Salary = 10000 AND Name = Smith 


Example ll. List the name and job of employees who have 


the same job as Smith. (A nested SELECT.) 


SELECT Name, Job 

FROM Employee 

WHERE Job = 
(SEENCT Job 
FROM Employee 


WHERE Name = Smith) 


Beeld INSERT REQUEST 
The INSERT request is used to create rows (tuples) Iai 
relation (table) and has the general form: 
INSERT INTO R 
MeGUES (V ,ece,V) 
where R is the relation name and V is a value. The order 
in which the data values are listed in the INSERT must 


correspond to the order of the columns in the table. An 
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example of an INSERT is as follows. 

INSERT INTO Employee 

VALUES (Smith,2,10000) 
This example creates a new tuple within the employee rela- 
tion. Assuming that the Employee relation has attributes 
name, rank, and Salary, a new tuple is created with the name 


being Smith, the rank being 2, and the salary being 10000. 


fee CHE DELETE REQUEST 

The DELETE removes a row (tuple) or rows (tuples) from a 
table (relation). It has the general form: 

BELCETE FROM R 

WHERE B® b ,..-,B@ b 
where R is the name of the relation, B is an attribute of 
Ehe relation, p is a relational operator, and b is a con- 
Stant. The WHERE clause for the DELETE has the same options 
available that are in the SELECT. An example of a DELETE is 
as follows. 

DELETE FROM Employee 

WHERE Name = Smith 
This deletes all rows from the Employee table where the name 


1s equal to Smith. 
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Pee tHE UPDATE REQUEST 

The UPDATE command changes the attribute values’ Stored 
in the database. It has the general form: 

UPDATE R 

SET A = @ ygeerwzyA = A 

WHERE B@ b ,...,BQ b 
where R is the relation name, A is the attribute to be as- 
Signed a new value,a . The WHERE clause has options as pre- 
viously discussed. An example of an UPDATE request 1S _ as 
follows. 

UPDATE Employee 


20000 


SET Salary 


WHERE Name = Smith 


This update results in all employees named Smith having 


their salaries set at 20000. 
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V. THE MAPPINGS FROM SQL TO THE MDBS QUERY LANGUAGE 

The idea of a SQL interface to MDBS is to provide to the 
user a friendly interface. SQL was chosen as the query 
language because of its English-like syntax and the 
existence of a wide-spread community of SQL uSers. We must 
emphasize here that we are implementing an interface between 
the SQL users and MDBS. We are not adding functionality to 
MDBS. 

The distinction between implementing an interface and 
adding functionality is important for the following reason. 
SQL is a relational query language. The primary operations 
SrmeowL are SELECT, UPDATE, INSERT, and DELETE. The special 
relational operations, projection and join, are included in 
SQL, as well as aggregate operations, ordering, and various 
Set operations. SQL is uSually supported by ae relational 
database management system which implements all of these 
relational operations. 

MDBS, however, iS not based on the relational model. 
The data model of the MDBS machine is the attribute-based 
model. The attribute-based model is flexible, and can sup- 
port relational data structures: relations, tuples, and at- 
tributes [Ref. 17]. However, the functionality of MDBS does 


not encompass all relational operations. The four primary 
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operations of the MDBS machine are RETRIEVE, UPDATE, INSERT, 
emae DELETE. The aggregate operations are also supported. 
MDBS does not Support the join and ordering operations. Nor 
does it Support set operations. 

From the discussion above, it is clear that the set of 
SQL operations to be included in our interface will be lim- 
ited to those supported by the functionality of MDBS. The 
subset of SQL operations which 6an be slipported by MDBS 
directly is formally specified in Appendix 8B. In the 
remainder of this chapter, we define the mappings from the 
subset of SQL which can be Supported directly by the primary 
operations of MDBS. We present the mappings both in graph- 
ics and in text. In the next section we explain the graphic 
notations. The remaining sections of this chapter give the 


details of the mappings from SQL to the MDBS query language. 


A. GRAPHIC NOTATION 

We will show the mappings graphically, and also explain 
them in text. The graphic notation is illustrated in Figure 
4. The general forms of the SQL and MDBS queries used here 
have been developed in Chapters IV and V. The mappings are 
represented by directional arrows, and symbols’ indicating 
the type of the mapping. We have identified two types of 


Mappings: syntactic substitution and conversion. 
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Syntace lcC—Substitutlon mappings require only simple sub- 


stitution of syntactical terms. The symbol for this type of 
mapping is a square marked with tthe letter S. Figure 4 
shows two examples of a syntactic substitution mapping. The 
first example maps the SQL SELECT term to the MDBS RETRIEVE 
term. The second example maps the SOL semen tol isite tO athe 
Mmeeeetarget list. This example illustrates that a syntactic 
Substitution may be a direct copy of clauses from the SQL 
query to the MDBS query. 

Conversion mappings combine a clause from a SQL query 
with information about the MDBS data structure to derive the 
Clause of the MDBS query. The symbol for conversion mapping 
is a triangle marked with the letter C. In Figure 4, the 
mapping of the FROM and WHERE clauses of the SQL query into 
the query clause of the MDBS request is a conversion 
mapping. 

In Section B, we present the overall structure of the 
mappings from SQL queries to MDBS queries. In Sections C, 
D, and E, we discuss individually the three conversion 


mappings identified in Section B. 


B. MAPPING REQUESTS FROM SQL TO MDBS 
In this section, we show the syntactic~substitution 


mappings for the general forms of the SQL SELECT, UPDATE, 
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SELECT 
selnwexprolist 


FROM tableinate 





(WHERE rFoolean]) 


LGROUP BY fieldiname) 


RETRIEVE 


query 


targetelist 


[BY sttrreutre) 


Figure 4, FMarreing the SGL SELECT to tne MDBES RETPIEVE 
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INSERT, and DELETE requests into the MDBS RETRIEVE, UPDATE, 
INSERT, and DELETE requests, respectively. The conversion 
mappings are explained in detail in Subsequent sections. 
The first mapping is the SQL SELECT request to the 
MDBS RETRIEVE request. The SELECT query has the general 
form: 
SELECT sel_expr_list FROM table name 

[WHERE boolean] 

[GROUP BY field name] 
The RETRIEVE request has the general form: 

PERREEVE query target list 

(BY attribute] 
The SELECT to RETRIEVE mapping has been shown in Figure 4. 
The reserved word RETRIEVE is substituted for the reserved 
memdeonlbClT., The sel expr list is a list of attributes that 
the user wishes to access from the database, and directly 
corresponds to the MDBS target list. Consequently, it can 
Simply be copied into the MDBS request. The "FROM table name 
[WHERE boolean]" portion of the SQL request requires a 
conversion mapping into the "query" portion of the MDBS 
language. This conversion will be discussed in Section C. 
The reserved words 'GROUP BY' of SQL are directly translated 


into the MDBS reserved word, BY. The attribute upon. which 
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the grouping is to take place is copied from the SQL query 
to the MDBS request. 

Figure 5 illustrates the mapping required for the in- 
sert requests. The general form for the SQL INSERT request 
is: 

INSERT INTO table name VALUES insert spec 
The MDBS INSERT request's form is: 

PYSERT record. 
The reserved word INSERT is the same for the two requests. 
The remaining portion of the SQL request, ‘INTO table name 
VALUES insert spec", requires a conversion mapping into’ the 
record portion of the MDBS query. This conversion will be 


explained in Section D. 
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INSERT INTO takleiname VALUES insertuspec 


INSERT recerdad 


Bel mee PAreime etme SGEPrTRecEKT to tne MDES INSERT, 
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The mapping for the delete requests is shown in Fig- 

ure 6. The delete request in SQL has the general form: 

DELETE FROM table _ name [WHERE boolean] 
While in MBDS the general form is: 

DELETE query. 
The reserved word DELETE is common to both requests. The 
conversion of the "FROM table name [WHERE boolean]" portion 
of the SQL request into the "query" portion of the MDBS- re- 
quest is the same as that required in the SELECT request, 
and will be discussed in Section C. 

Figure 7 depicts the mapping for the update request. 

The general form for the update request in SQL is: 

BeATE table mame set clause list 

[WHERE boolean] 

In MDBS the form is: 

UPDATE query modifier. 
The SQL reserved word UPDATE is simply copied into MDBS. 
Bie = table name [WHERE boolean]™ conversion mapping is like 
that used in the SELECT and DELETE requests and will be ex- 
meeimed in Section C. The set _ clause list of SQL requires a 


conversion mapping in order to match to the modifier 
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- 


DELETE FROM tablecname CwHERE boolean) 


DELETE query 


Prot nmemobEeMdretmgu tne oat CELETE to the MOSS DELETE, 
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UPCATE 


tarleiname 





set.clause.wlist 


(WHERE poolean]j 


UFDATE query mocifier 


Figure 7, Marring the SGL UFLATE to the MDES UFLATE, 
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portion of the MDBS request. That mapping is explained in 


section E. 


C. THE CONVERSION MAPPING TO THE MDBS QUERY 

The select, delete and update requests of SQL all have a 
"FROM table name [WHERE boolean]" portion. In the update 
request it varies slightly in that the reserved word FROM is 
not used. However, the converSion required is essentially 
the same. This portion of the SQL request maps into’ the 
"query" portion of the MDBS retrieve, delete and update re- 
quests. However, due to the variety of forms and constructs 
available in SQL, a conversion is required to reconstruct 
this portion into an acceptable MDBS format. 

As illustrated in Figure 8, much of the converSion re- 
quires only a simple mapping. The specification of the MDBS 
query requires that the first attribute-value relationship 
be "FILE = attribute", where the attribute is the name of a 
iMeeemee tn1S iS equivalent to the SQL, "FROM table name”. 

In addition MDBS requires that queries be composed in 
the disjunctive normal form. SQL does not have this res- 
triction. This is demonstrated in the examples below, where 
the SOL "(WHERE boolean]" clause is mapped into a 
disjunction of conjunctions in the MDBS request. To explain 


the conversions required to convert SQL's "boolean" into an 
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FROM table.name {[wHERE boolean) 


File = attricute *MDBRS guery form 


Pieyre §. Macpeoing tS tr® MLBS Query. 
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acceptable MDBS "query", examples will be used. In each 
case a SQL request will be shown, followed by the 


corresponding request in MDBS. 


Example 1. Obtain the names of the employees that 


have a salary of 10000 and are clerks. 


SELECT Name 
FROM Emp 


WHERE Sal = 10000 AND Job = Clerk 


RETRIEVE 
((File = Emp) A 
(Sal = 10000) AV (Job = Clerk)) 


<Name> 
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Example 2. Obtain the names of employees who have 


a salary between 5000 and 10000. 


SELECT Name 
FROM Emp 


WHERE Sal BETWEEN 5000 AND 10000 


RETRIEVE 
((File = Emp) A 
(sal > —= S000) 7\ (Sal <= 10000) )) 


<Name> 
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Example 3. Obtain the names of employees who are 


employeed as a clerk, analyst or manager. 


SELECT Name 
FROM Emp 


WHERE Job IN (Clerk, Analyst, Manager) 


RETRIEVE 
(((File = Emp) A (Job = Clerk)) Y 
((File = Emp) A. (Job = Analyst)) VY 
((File = Emp) /\ (Job = Manager))) 


<Name> 


As seen in example 3 above, the reconstruction of the SQL 
request into acceptable MDBS disjunctive normal form re- 
quires the identification of the file attribute in each 


predicate. 
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Pee the CONVERSION MAPPING TO THE MDBS RECORD 

Bebo sinSere ~request uses “INTO table name VALUES 
meeteemspec" to identify the relation and attribute values 
that are to be inserted as a record. This corresponds’7 to 
the "record" portion of the MDBS insert request. The MDBS 
record is a Series of attribute-value pairs. The first pair 
is the file name (ex. <File,Emp>). This corresponds to 
pee@es INTO table name which identifies the relation name. 
peur e 9 illustrates this mapping. 

Biles insert Spec” portion of the SQL insert request is a 
listing of the values to be inserted in the relation. The 
ordering of the values must be identical to the ordering of 
the attributes in the relation, and all attributes must have 
an assigned value. MDBS, on the other hand, represents a 
record as a list of attribute-value pairs. There is no re- 
quirement for ordering of the attribute-value pairs, as 
values are matched with attributes. Nor does MDBS require 
that values be assigned to all attributes. Instead MDBS as- 
Signs default values of zeros or spaces for integer and 
character attribute types. 

In order to implement the SQL insert request, the MDBS 
record template information will have to be made available 
to the interface. The attribute names in the- record 


template are ordered. The attribute names in the template 
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INTO tatrlecnare 
ealje attribute 


VALUES insertuspec 


Merge with attrirnute 
names from tenplate, 


Pure ms aeeeceime, ce the MDPS record, 





can then be matched to the values listed in the SQL insert 
request to form the attribute-value pairs of the MDBS 
record. For example, the following SQL inSert request 

PNGERT INTO Emp VALUES Smith, Clerk, 10000 
would be converted to read 

INSERT (<File,Emp>, <Name,Smith>, 

“Top y,clerk>, <cal, L0000>) . 

Alternatives for implementing this conversion will be 


further discussed in Chapter 6. 


E. THE CONVERSION MAPPING INTO THE MDBS MODIFIER 

Rie ssset Clause list™ of the SOL update request has a 
direct correlation to the "modifier" of the MDBS update re- 
quest. Figure 10 illustrates thiS mapping. SOE has 
constructs to represent the first four types of MDBS modif- 
lers. The TYPE-0 modification sets the new value of the at- 
tribute being modified to a constant. The TYPE-I modifica- 
tion obtains the new value of an attribute being modified by 
Setting it to some function of the old value. The TYPE-II 
modification sets the value of the attribute being modified 
to some function of another attribute contained within the 
Same record. The TYPE-III modifier derives the value of the 
attribute being modified from some function of an attribute 


contained within another record. SQL has no construct which 
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SET filelduname = exer 


attribute .beinauinoditied PCES modifier ferm 


Pilcume  10gmererpine te the “DeS modifier. 





corresponds to a TYPE-IV modification, which derives the new 
value of the attribute being modified from a function of 
another attribute value in another record identified by the 
pointer in the modifier. 

SQL offers a wide variety of constructs for its "“expr" 
in the set clause list. In the examples below, we illustrate 
the correspondence between these constructs and the MDBS 
modifiers. The conversion required is a _ reordering or 
rewriting of these constructs into acceptable MDBS' format. 
The conversion is much like that used in the query mapping 
of Section B. The following examples illustrate the conver- 
Sions that are required. For simplicity, the examples are 
Singular updates. The SQL request will be presented first, 


followed by the corresponding MDBS request. 
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Example 4. Set the salary of all employees named 
Sle to Lol. 
(Ex. MDBS Type~0O modification) 
UPDATE Emp 
SET Salary = 10000 
WHERE Name = Smith 
UPDATE ((File = Emp) A 
(Name = Smith) ) 
<Salary = 10000> 
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Example 5. Double the salary of all employees 
named Smith. 


(Ex. MDBS Type-~1 modification) 


UPDATE Emp 
SET Salary = 2 * Salary 


WHERE Name = Smith 


UPDATE ((File = Emp) A 
(Name = Smith) ) 


<Salary = 2 * Salary> 
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Example 6. Set the salary of all employees 
named Smith to eight times the value 
of their rank. 

(Ex. MDBS Type-II modification) 
UPDATE Emp 
SET Salary = 8 * Rank 
WHERE Name = Smith 
UPDATE ((File = Emp) A 
(Name = Smith)) 
<Salary = 8 * Rank> 
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Example 7. Set the salaries of the employees 
named Smith to that of a manager's, 
as recorded in the Positions file. 


(Ex. MDBS Type-III modification) 


UPDATE Emp 
SET Salary = (SELECT Salary 
FROM Positions 
WHERE Job = Manager) 


WHERE Name = Smith 


UPDATE ((File = Emp) A 
(Name = Smith) ) 
<Salary = Salary of (File = Positions) A 


(Job = Manager)> 
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VI. RECOMMENDATIONS FOR IMPLEMENTATION 


In Chapter V we demonstrated that a SQL-to-MDBS query 
language interface could be constructed for a subset of SQL. 
In this chapter, we will discuss the implementation issues. 
The first section deals with two areas of differences 
between the constructs of SQL and the MDBS query language. 
One is the MDBS requirement that queries be constructed in 
the disjunctive normal form. The other is the difference in 
the construct of the insert requests, as addressed in 
Chapter V, Section D. 

In the second section we give suggestions for expanding 
the capabilities of the SQL/MDBS interface to support some 
SQL constructs that MDBS does not directly support. These 
are constructs which can be mapped from a single SQL request 
into a series of MDBS requests. The third section discusses 
extending MDBS to support the join and sort operations. The 
ase Section of this chapter discusses the use of program 


development tools to aid in the actual implementation. 


A. SQL AND MDBS DIFFERENCES 
In order to effectively support the SQL~-to-MDBS inter- 
face, two differences in construct between the two languages 


have to be resolved. The first is the MDBS requirement that 
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all queries be written in the disjunctive normal form. The 
second is the form of the SQL insert request aS compared _ to 
the MDBS insSert request. 
iene Disjunctive Normal Form 

MDBS requires the query portion of the _ retrieve, 
delete, and update requestS to be written in disjunctive 
normal form. On the other hand most commercial versions of 
SQL do not place this restraint on the user. In order to 
Support this capability, the interface will be required to 
translate the free-form logical SQL statements into the dis- 
jJunctive normal form. In the very simple cases this is not 
an extraordinary burden. However, in any involved query the 
cost of translation could be expensive. For this reason, 
and to simplify construction of the interface, we believe 
that the user should be required to formulate his’ requests 
in the disjunctive normal form. This should not place a 
burden on the user Since typically most requests are of a 
Simple construction. 

pembircrerences in the Insert Request 

The syntax of the inSert request in SQL places a bur- 
den on the user to know the construction of the table into 
which he/she wishes to insert values. Each field must have 
an assigned value, and values must be listed in the order of 


the field names in the table definition. MDBS, on the other 
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hand, specifies the record to be inserted as a list of 
attribute-value pairs. The attribute-value pair is a direct 
assignment of value to the indicated attribute. There is no 
constraint on the ordering of the pairs. 

We recommend an enhancement for the SQL language  inter- 
face, a new syntax for the insert request. The revised gen- 
eral form would be 

INSERT INTO table name VALUES insert_values. 
The syntax for insert values would be 
insert_values := (field _name,insert_spec) 
| insert values, (field _name,insert spec) 
This change in syntax brings the SQL insert command _ into 
line with the attribute-value pair syntax of the MDBS query 
language. More importantly it is believed that this change 


will improve user-friendliness. 


B. EXPANDING THE FUNCTIONALITY OF THE INTERFACE 

There are some basic SQL constructs which, while not 
directly supported by MDBS, can be mapped into a series of 
MDBS requests. The most important of these is the nested 
select construct. Additionally, commercial implementations 


of SQL offer a variety of features for manipulating and 
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editing data in result relations. We discuss below how the 
SQL-to-MDBS interface can be extended to provide these 
features. 
PeoMmeetcrt Joins Through Nested Selects 

SQL has the capability to nest select requests, as 
discussed in Chapter IV. MDBS does not have this capabil- 
ity. The SQL syntax dictates that the innermost nested 
select be evaluated first. Ea eaTenon then proceeds out- 
ward. Translated into MDBS, this requires a succession of 
retrieve statements. The innermost select statement 
corresponds to the first retrieve request. The following is 
an example of a SQL request with a nested select, and a 
series of MDBS retrieve requests that obtain the same 


results. 
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Example. Obtain the names of the employees who have 


a salary equal to that of a manager. 


SELECT Name 

FROM Emp 

WHERE Sal = (SELECT Sal 
FROM Payroll 


WHERE Job = Manager) 


RETRIEVE ((File Payroll) A (Job = Manager)) <Sal> 


RETRIEVE ((File 


Emp) A. (Sal = Sal)) <Name> 


In the above example the Sal value obtained in the first 
MDBS request would be used as the Sal value in the second 
retrieve in order to obtain the Name. 

In order to implement this capability in the interface 
we recommend that a pre-preprocessor be written that 
exclusively looks for nested selects. The pre-preprocessor 
finds the innermost select and sends it to the preprocessor. 
The value(s) obtained from the operation would then be 
inserted into the query portion of the next level select. 


This outward operation would continue until the entire 
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request had been executed. Utilization of the 
pre-preprocessor allows the preprocessor to operate on 
Single select requests. 
2. Formatting Options 

SQL gives the user some options in the formatting of 
his/her output within the context of the select request. 
This includes creating new headings, indentations, and. pro- 
ducing columnar/tabular outputs. The following example 
changes the column name based on the Sal attribute to a more 


readable heading, "Salary". 


SELECT Name, Sal Salary 
FROM Emp 


WHERE Name = Smith 


In order to make MDBS more user-friendly and useful in the 
area of report generation and formatting, we recommend that 
a post-processor be implemented as part of the interface. 
The post-processor would be responsible for performing the 
format and output options. 
Eeeoatrithmetic Operations and Functions 

SQL affords the user the ability to specify arithmet- 
ic operations and functions on the values of the result re- 
lation. For example, the following select request creates a 


new column in the output called ‘'comm/sal', which is derived 
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from two existing attributes by dividing the comm attribute 


by the sal attribute. 


SELECT Name, Comm/Sal, Comm, Sal 
FROM Emp 


WHERE Job = Salesman 


The following is an example of a arithmetic function option 


ime SOL. 


SELECT Name, ROUND (Sal, 2) 
FROM Emp 


WHERE Job = Salesman 


This example rounds the value of Sal to two decimal places. 
These and similar operations, can be implemented in a 


post-processor. 


C. JOIN AND SORT OPERATIONS 

SQL and the relational data model Support join and sort 
operations. Currently MDBS does not support either. Imple- 
mentation of the nested select, as discussed in the previous 
section, would enable MDBS to Support implicit joins, i.e., 
nested select requests. As MDBS is still in development, 
further research is required into the feasibility and 


desirability of implementing these operations on MDBS. The 
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consideratons of costs versus the additional capability that 
would be provided by such an implementation is outside the 
Scope of this paper. However, if implemented, the effort to 
include the required SQL-to-MDBS translation in the 


interface would be minimal. 


D. TOOLS FOR ACTUAL IMPLEMENTATION 

_ We recommend that the actual implementation of the in- 
terface be done uSing Yacc (Ref. 19] and Lex [Ref. 20], pro- 
gramming tools developed at Bell Laboratories. They can _ be 
used to produce an interpreter which accepts SQL requests 
and outputs the translated MDBS request. 

Lex is a lexical analyzer generator, designed for lexi- 
cal processing of character input streams. The user sSup- 
plies the specifications for character String matching, Lex 
then produces a program in the programming language C, which 
recognizes regular expressions. Lex is generally used with 
Yacc to recognize and supply tokens. 

Yacc, an acronym for Yet Another Compiler-Compiler, is a 
general tool for imposing structure on the input to a com- 
puter program. The user prepares a specification of the in- 
put process, i.e., rules and actions. Yacc then generates a 
Program of functions to control the input process. Yacc 


calls the lexical analyzer (Lex) to supply tokens, and then 
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parses the supplied tokens according to the production 
rules. 

Utilization of Yacc and Lex has_ several advantages. 
First, the MDBS query parser was created uSing these tools. 
Therefore in the event of any required scanner/parser_ to 
scanner/parser communications, both would be operating in 
Slmilar environments. Second, both are written in C, which 
improves their transSportability. Third, Yacc and Lex are 


both well documented and are relatively easy to uSe. 
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VII. CONCLUSION 


MDBS uses the attribute-based data model. Records are 
composed of ordered pairs of the form (an attribute, its 
value). Descriptors, or indices, are defined for selected 
directory attributes. These descriptors are used to parti- 
tion the database into clusters. The clusters are distri- 
buted across the backends to take full advantage of parallel 
execution of requests. 

The MDBS user accesses the database using a simple, 
non-procedural query language. The language supports four 
different types of requests: retrieve, insert, delete, and 
update. The retrieve query is used to access, but not 
alter, the contents of the database. The insert and delete 
requests are used to add or remove records in the database. 
The update request modifies existing records of the 
database. 

SQL is a relational query language, designed for use 
with relational databases. Like the MDBS query language, it 
has four different types of requests: select, insert, 
delete, and update. Like the MDBS retrieve request, the 
select accesses, but does not alter, the contents of the da- 
tabase. The SQL insert, delete, and update requests perform 


operations similar to those of their MDBS counterparts. 
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However, unlike the MDBS query language, SQL offers a 
wamtety of options in the syntax of its requests. This 
variety enables SQL requests to be constructed with varying 
degrees of logical and syntactical complexity. 

In this thesis, we have identified the direct mappings 
from SQL queries into MDBS queries. These mappings can be 
directly supported in the SQL-to-MDBS interface. We have 
also identified those SQL constructs which have no direct 
mapping, but can be converted into a sequence of MDBS 
queries. Enhancements to the interface are proposed to sup- 
port these indirect mappings. Lastly, we have identified 
those SQL constructs for which no mapping exists. To sup- 
port these mappings, the functionality of MDBS must be aug- 
mented. Let us discuss each of these cases, identifying the 
contributions of this thesis and directions for further 


research. 


A. THE DIRECT MAPPINGS 

Some SQL queries can be directly mapped into MDBS_ re- 
quests. The retrieve, insert, delete, and update requests 
of MDBS have a direct functional correspondence to the SQL 
select, insert, delete, and update requests, respectively. 
There are three exceptions which require a degree of insight 


in order to perform a mapping. These are the mapping of the 
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"FROM table name [WHERE boolean]" portion of SQL into the 
"query" portion of MDBS, the mapping of the SQL "INTO 
table name VALUES insert spec" into the MDBS "record", and 
Eiememapping of the SOL “set clause list" into the MDBS 
maodiftier”. 

MDBS requires that the query portion of the request be 
written in disjunctive normal form. SQL, on the other hand, 
allows for free formatting of its logical constructs. ZO 
convert the SQL "FROM table name [WHERE boolean]" construct 
into acceptable MDBS "query" format requires translating the 
options contained in the SQL "boolean" into MDBS disjunctive 
normal form. The complexity of this translation is O(n*¥*n), 
where n is the number of predicates in the boolean expres- 
Sion. In order to limit the overhead of this translation, 
we recommend that the SQL-to-MDBS interface require the uSer 
to construct SQL qualifications in disjunctive normal form. 

Bie ssOL insert request uses ‘INTO table name VALUES 
insert spec" to identify the relation and to list the values 
to be inserted. This list of values must correspond in ord- 
er and type to the constructed relation. MDBS, on the other 
hand, uses attribute-value pairs for insert parameters. One 
Solution to this conversion is to have the SQL-to-MDBS 
interface provide to the user the MDBS record template for 


assignment of values. Another approach, which we recommend, 
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is to alter the syntax of SQL'S insert request to make it 
correspond to the attribute-value pair syntax of the MDBS 
"record". This eliminates the requirement that the user 
aie the exact structure of the relation definition. 

eetese set Clause list™ and MDBS°s “modifier” are used 
to identify the attributes to be changed as a result of an 
update request. In addition, they specify the type of up- 
date. With the exception of the TYPE-IV modification in 
MDBS, which uses a pointer, there is a direct correspondence 
between the two languages in the syntax of their update re- 
guests. The only conversion required is formatting’ the 
mecmmclaise Jist™ into one of the acceptable MDBS "modifier” 
types. 

These direct mappings have been fully explained in this 
thesis. Further research will involve implementing the in- 
terface. For implementation a lexical scanner and an inter- 
preter could be constructed using the Yacc and Lex 


programming tools. 
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B. ENHANCEMENTS TO SUPPORT FURTHER MAPPINGS 

There are several constructs which cannot be supported 
by direct mappings, but can be Supported by an enhanced in- 
Perrace. The first of these is the implicit join operation, 
implemented in SQL by the nested SELECT. A re-preprocessor 
can be constructed to convert the nested SELECTS into a 
Series of MDBS queries, and to control the iterative 
execution required. | 

Several options are available in commercial versions of 
SQL which are not supported in MDBS, such as arithmetic 
operations and functions, and output formatting. In order 
to implement these features a post-processor could be con- 
Structed. Further research will be required to design and 


analyze these pre- and post-processor functions. 


SeeOrSRATIONS FOR WHICH NO MAPPING EXISTS 

The SQL options that cannot be Supported by MDBS are re- 
lated to the relational join operation and to the sorting 
capability commonly found in relational systems. MDBS, 
which is not a relational system but an attribute-based sys- 
tem, does not support either the join or the Sort operation. 

In order to provide a fully-functional relational inter- 
face to MDBS, some provision must be made to implement these 


operations. There are two choices. First, the join and 
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“sort operations could be implemented in MDBS. Second, these 
operations could be preformed by additional software running 
on the host. Further research will be required to identify 


the costs and the tradeoffs of these two alternatives. 
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APPENDIX A: FORMAL SPECIFICATION OF DML FOR ATTRIBUTE-BASED 
LANGUAGE 


The following is the BNF for the attribute-based data 
manipulation language developed by Hsiao and Menon [Ref ]. 
Square brackets [ ] are used to indicate optional con- 
structs. 


Baeaicate  edineienoeie Te) op value 
attribute [=e ehianS Erling 
Beermeute being modified := attribute 
base attribute eC eee ec 
value 2:= string 

| number 

| float 
Congunct := (Predicate) 

| (Conjunct / Predicate) 
Query 2= Conjunct 

| Query / Conjunct 
Sec := AVG | MAX | MIN | SUM | COUNT 
ao C1 ocho Cait 1 OME) 
list >= attribute 

| Tist el 

| list,attribute 


| list,list el 


ieeeet list (List) 


<attribute,value> 


Peart val pair 


Bate record ae rom val spall r 
| Half record, Attrib val pair 
Record joe da tt erecord} 
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Pointer 


Modifier 


type-0 
type-I 
type-II 
Eype-Iit 
type-IV 


Request 


Insert 
Delete 
Update 


Retrieve 


Pemletter 


string 


Fenletter 


oe 


oe 


oe 


number 


type-0 
my pe! 

| type-II 
| type-III 
| type-IVv 


<attribute _being modified 
value> 


<attribute being modified 
exprl> 


<attribute being modified 
expr2> 


Getbowicemoe! ng modi £ied 
expr2 of Query> 


<attribute being modified 
expr2 of Pointer> 


LASere 

| Delete 

| Update 

| Retrieve 


INSERT Record 
DELETE Query 
UPDATE Query Modifier 
REIRIEVe Query Target list 
[BY Attribute] 
[WITH Pointer] 
Ce SmeCmnlbeels s | 2 


ice le tier 
Moen inguuceletter 
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char String 


digit 


number 


float 
add_op 
mult op 


exprl 


fein) Lermi 


aieeen L£actorl 


Sar 2 


aie term2Z 


Mame Lactor2 


Hem beit ey 
| char string le letter 


Clee 2 lee 


argit 
| digit number 


number.number 


arith terml 
Wexner wadd-Op arith termi 


aetehuractors 
Poa eiecerml mult lop 
arith factorl 


attribute being modified 
| number 


ate ede Bile 
| expr2 add_op arith term2 


atime aciton 2 
| arith _term2 mult op 
arith factor2 


Dasemat tr oulte 
| number 


or 


APPENDIX B: 


FORMAL SPECIFICATION OF DML FOR SQL MAPPING 


The following is the BNF for the SQL query language to 


MDBS query language mapping. 


Square brackets [ }] are-used 


to indicate optional constructs. 


dml statement 


insertion 
Pisett Spec 
deletion 
update 


waeme Clause 


Semmeciause list 
eeteciause 
selection 


query block 


senect clause 


ee 


ee 


ee 


ee 


selection; 
| insertion; 
| deletion; 
| update; 


INSERT INTO table name VALUES 
ae se hemes ere 


selection 
| literal 


DELETE FROM table name 
Beets me allise | 


UPDATE table Biame Ser ie emeliicie latte 
{f where _clause } 


WHERE boolean 


Scene hase 

| set clause list , set clause 
SET £16€1d name = expr 

| SET field name = ( selection ) 


query block 
| ( selection ) 


select clause FROM table _name 
[ WHERE boolean ] 
[ GROUP BY attribute ]} 


plgel woe expr 11st 
IPSEEEGIE® 9 
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eelwexpr list 


sel_expr 


boolean 


boolean _term 


boolean factor 


predicate 


Bee table spec 


Baore Spec 


expr 


aren term 


Gieieh factor 


primary 


comparison 


Berk Op 


add_op 


tl 


e¢ 


See pr 
| sel_expr_list , sel _expr 


Evetaename 
| stat ( field name ) 


boolean_term 
| boolean OR boolean _term 


boolean factor 
| boolean_term AND boolean factor 


[ NOT ] boolean_primary 


attribute comparison value 

| attribute BETWEEN value AND value 

| attribute NOT BETWEEN value AND 
value 

| attribute comparison table spec 

| table spec comparison 
BOlwwedo Le Spec 


table spec 


| value 


( selection ) 


query block 
| 
| ( literal ) 


ae lenmcenrm 
| expr add_op arith term 


aise nmetractor 
Mosieimectihiamule op arith factor 


primary 


value 
| ( expr ) 


rel_op 

| IN 

| NOT IN 

i On OS eo 


+1 - 
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mult op 
Stat 


literal 


micmcuple List 


Pic cuple 
table name 
field name 
attribute 


value 


Char string 


string 


fom etter 
hemhetter 


number 


Float 


gigi t 


rma ef. 
AVG | MAX | MIN | SUM | COUNT 


ite meus Se 
| lit tuple 


lit tuple 
| lit tuple list , lit tuple 


value 
attribute 
attribute 
char string 
string 

| number 


| float 


UGwke Ct er 
| char_string lc_letter 


ucwletter 
Pee Gmuc meter 


aici 
| number digit 


number . number 


Cee 2 lec. | OD 


Si5 
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